Abstract: In this paper we present decision-feedback differential
detection (DF-DD) schemes for autocorrelation-based detection in
impulse-radio ultra-wideband (IR-UWB) systems, a signaling scheme
regarded as a promising candidate in particular for low-complexity
wireless sensor networks. To this end, we first discuss ideal
noncoherent sequence estimation and approximations thereof based on
block-wise multiple-symbol differential detection (MSDD) and the
Viterbi algorithm (VA) from the perspective of tree-search/trellis
decoding. Exploiting relations well-known from tree-search decoding,
we are able to derive the novel DF-DD schemes. A comprehensive
comparison with respect to performance and complexity of the
presented schemes in a typical IR-UWB scenario reveals, along with
novel insights in techniques for complexity reduction of the sphere
decoder applied for MSDD, that sorted DF-DD achieves
close-to-optimum performance at very low, and in particular
constant, receiver complexity.



I. Introduction 

ULTRA-WIDEBAND (UWB) transmission systems are widely regarded as a
promising technique for short-range applications like wireless
sensor networks (WSNs) [1], as the relatively large signaling
bandwidth enables a reduced transmit power spectral density and
coexistence with established narrow-band systems, and supports a
large number of simultaneous users. In particular, impulse-radio UWB
(IR-UWB) is especially well suited for WSNs due to its robustness to
severe multipath fading even in indoor environments, the potential
to provide accurate localization, and, last but not least, its low
cost and complexity [?]. Moreover, in WSNs information is commonly
transmitted in relatively short bursts and only low data rates have
to be supported, such that intersymbol interference can easily be
avoided. We denote the burst length by N.

Avoiding costly channel estimation, low-complexity IR-UWB receivers
rely on noncoherent detection such as energy detection in the case
of pulse-position-modulated IR-UWB, or autocorrelation detection in
the case of pulse-amplitude-modulated IR-UWB [3], cf., e.g.,
(differential) transmitted-reference [?]. In particular, an
autocorrelation receiver (ACR) enables conventional symbol-wise
differential detection (DD).

The performance of ACR-based DD (in terms of the signal-to-noise
ratio (SNR) required to guarantee a desired bit error rate (BER)),
however, suffers a large gap compared to idealistic detection
assuming perfect channel estimation. This gap can be bridged to a
large extent when jointly deciding for the best sequence of the N
symbols within the burst based on correlations of the receive signal
ranging over the entire burst interval, i.e., employing an N-branch
ACR [?]. However, for large burst length, this ideal noncoherent
sequence estimation imposes a very high complexity burden, as it a)
requires delaying and correlating the receive signal over the entire
burst interval, and b) exhibits a very high computational complexity
to find the optimum sequence (exponential in N). Hence,
reduced-complexity detection schemes that still achieve
close-to-optimum performance are called for.

In this paper, we focus on detection schemes employing a still
extended, but reduced-complexity, L-branch ACR, where L ≪ N. We
first review two well-known techniques, and show how both are
connected to ideal noncoherent sequence estimation from the
perspective of tree-search decoding [6]. In particular, we consider
block-wise multiple-symbol differential detection (MSDD) [5]
employing the sphere decoder (SD) [?] in combination with techniques
for complexity reduction, e.g., [?], and detection based on the
Viterbi algorithm (VA) [7]. The drawback of both methods, despite
their good performance, is that their computational complexity is
relatively high and in the worst case increases exponentially with
the blocksize or memory length, respectively. The main contribution
of this paper is to exploit the well-known relation of
decision-feedback detection as an approximation of tree-search
decoding [6], [9], [10]. In doing so, we are able to transfer the
concept of decision-feedback differential detection (DF-DD) [11],
[12], [13] to ACR-based detection of IR-UWB, yielding a
computational complexity only linear in L. Similar approaches have
successfully been applied, e.g., in the area of differential
space-time modulation [14], [15]. As known from multi-antenna
systems, the performance of decision-feedback detection can be
improved when decisions are taken in an optimized order [16]. A
comparison with respect to performance and complexity of the
presented schemes allows us to conclude that such sorted variants of
DF-DD for IR-UWB achieve close-to-optimum performance at very low,
and in particular constant, receiver complexity, and thus realize a
very good performance-complexity tradeoff.

Noteworthy, besides ACR-based detection there are other promising
non-autocorrelation-based approaches to IR-UWB detection, such as
the related approaches based on a decision-directed ACR [17] and on
crosscorrelations with iteratively generated reference templates
[18], or approaches exploiting the sparsity of the UWB propagation
channel via compressed sensing [19], [20] or RAKE reception
employing a reduced number of fingers [21].

This paper is organized as follows: in Sec. II the system model of
IR-UWB and the ACR front-end are described. The discussion of
ACR-based detection schemes in Sec. III starts with ideal
noncoherent sequence estimation, followed by two approximate methods
based on MSDD and the VA, and is concluded with the presentation of
the novel DF-DD schemes. A summary of the complexity of these
detectors allows us to conduct a comparison of the presented schemes
in Sec. IV. We conclude with final remarks in Sec. V.

II. IR-UWB System Model 

A. Receive Signal Model 

Throughout this paper, we consider transmission of binary
pulse-amplitude-modulated IR-UWB in bursts of N information symbols.
The receive signal is then given as

$$ r(t) = \sum_{i=0}^{N} b_i \, p(t - iT) + n(t) , \tag{1} $$

where $b_i \in \{\pm 1\}$, $i = 0, \ldots, N$, are N + 1 transmit
symbols, which represent N encoded information symbols
$a_k \in \{\pm 1\}$, $k = 1, \ldots, N$, and T is the symbol
duration. Differential encoding is assumed, such that
$b_i = b_{i-1} a_i = b_0 \prod_{k=1}^{i} a_k$, with the reference
symbol $b_0 = 1$, but equivalent encoding rules, e.g.,
multiple-symbol transmitted-reference (MSTR) [22], are also
possible. The overall receive pulse shape $p(t)$ results from the
convolution of transmit pulse, receive filter, and channel impulse
response; its energy is normalized to one, thus the energy per bit
is given by $E_b = 1$ (cf. Footnote 1). $n(t)$ is white Gaussian
noise of two-sided power-spectral density $N_0/2$, band-limited by
the receive filter. To preclude intersymbol interference, the symbol
duration T is chosen sufficiently large, such that each pulse has
decayed before the next pulse is received.
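The differential encoding rule $b_i = b_{i-1} a_i$ with $b_0 = 1$,
and its inverse $a_i = b_i b_{i-1}$, can be illustrated with a short
Python sketch. The function names are illustrative choices and not
taken from the paper; symbols are assumed to be stored as the
integers +1 and -1.

```python
from itertools import accumulate
from operator import mul


def differential_encode(a):
    """Map information symbols a_1..a_N to transmit symbols b_0..b_N.

    b_0 = 1 is the reference symbol and b_i = b_{i-1} * a_i, i.e. b_i is
    the running product of the information symbols.
    """
    return [1] + list(accumulate(a, mul))


def differential_decode(b):
    """Recover a_i = b_i * b_{i-1} from (estimated) transmit symbols."""
    return [b[i] * b[i - 1] for i in range(1, len(b))]


a = [1, -1, -1, 1, -1]
b = differential_encode(a)          # b == [1, 1, -1, 1, 1, -1]
a_back = differential_decode(b)     # a_back == a
```

Because only products of neighbouring symbols carry information, a
common sign flip of all transmit symbols would not change the decoded
sequence, which is why the reference symbol can be fixed to +1.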

Note that the problem of timing acquisition and the usually applied
frame structure used for time-hopping and code-division multiple
access [?], [23] are not explicitly taken into account, as the
latter can be regarded as additional linear block coding, or
averaged out prior to further receive signal processing, cf., e.g.,
[?].

B. Autocorrelation-Based Detection 

The core ingredient of all investigated schemes for IR-UWB signal
detection is the analog (or sufficiently sampled) front-end depicted
in Fig. 1, the so-called L-branch ACR. For the i-th symbol interval,
it computes the correlation coefficients ($l = 1, \ldots, L$)

$$ Z_{i-l,i} = \int_0^{T_\mathrm{i}} r(t + (i-l)T) \, r(t + iT) \, \mathrm{d}t \tag{2} $$

of the receive signal in the i-th and the L preceding symbol
intervals. The integration interval $T_\mathrm{i}$ ($< T$) of the
ACR is a receiver parameter, which can be adapted to the channel
characteristics at hand (cf., e.g., [?]).

Fig. 1. Block diagram of an L-branch ACR.

Footnote 1: Note that the energy for the first reference symbol is
neglected, as typically relatively long bursts are considered.

Demanding the channel to remain constant over an interval of L + 1
symbols, an L-branch ACR provides information on the relation of the
current symbol to the preceding L symbols. The phase transition from
$b_{i-l}$ to $b_i$ is superposed by an "information x noise" and a
"noise x noise" term, i.e.,

$$ Z_{i-l,i} = b_{i-l} b_i \int_0^{T_\mathrm{i}} p^2(t) \, \mathrm{d}t + \eta_{i-l,i} \tag{3} $$

where $\eta_{i-l,i}$ collects all terms corrupted by noise.
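For a sampled front-end, the correlations in (2) can be approximated
by a Riemann sum. The following is a minimal sketch under stated
assumptions: the receive signal is available as a list of samples
with `sps` samples per symbol duration T, and `n_int` of them fall
into the integration interval. The function name and all parameter
names are hypothetical and not part of the paper.

```python
def acr_outputs(r, num_symbols, sps, n_int, L, dt=1.0):
    """Approximate Z[l][i] of (2) for all pairs with 1 <= i - l <= L.

    r            sampled receive signal; symbol i occupies r[i*sps:(i+1)*sps]
    num_symbols  number of transmit symbols in the burst (N + 1)
    sps          samples per symbol duration T
    n_int        samples inside the integration interval (n_int <= sps)
    L            number of ACR branches (maximum correlation lag in symbols)
    dt           sampling interval used to scale the Riemann sum
    """
    Z = [[0.0] * num_symbols for _ in range(num_symbols)]
    for i in range(1, num_symbols):
        for l in range(max(0, i - L), i):
            acc = sum(r[l * sps + n] * r[i * sps + n] for n in range(n_int))
            Z[l][i] = Z[i][l] = acc * dt   # store symmetrically
    return Z


# Noise-free toy example: a decaying pulse that fits into one symbol interval.
pulse = [1.0, 0.5, 0.25, 0.0]
b = [1, -1, -1, 1]
r = [bi * s for bi in b for s in pulse]
Z = acr_outputs(r, num_symbols=4, sps=4, n_int=3, L=2)
# Without noise, Z[l][i] equals b_l * b_i times the pulse energy captured
# in the window (here 1 + 0.25 + 0.0625 = 1.3125), in line with (3).
```

Pairs with a lag larger than L are left at zero, which mirrors the
fact that an L-branch ACR does not provide these statistics.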

Difficulties in hardware implementation of the ACR may be regarded
as a question of technology. In particular the realization of
accurate analog delay lines remains a demanding task, cf., e.g.,
[3], [24], but advances in speed of A/D converters [24], [25] will
soon solve this problem.

III. Signal Detection 

In the case of a single-branch ACR (L = 1), using
$a_i = b_i b_{i-1}$, from (3) it can be seen that symbol-wise DD is
performed, such that the information symbols are directly obtained
as $\hat{a}_i^{\mathrm{DD}} = \mathrm{sign}(Z_{i-1,i})$. In the case
of an extended ACR (L > 1), there are various methods to finally
decide the transmit symbols based on the ACR output; these are
described in the following. Beginning with ideal noncoherent
sequence estimation (INSE), we discuss approximations thereof based
on MSDD and the VA. From this detailed, but unified treatment, we
are not only able to straightforwardly derive novel DF-DD schemes
for IR-UWB detection in Sec. III-D, exploiting relations well-known
from tree-search decoding [6], [9], [10], but also to conduct a
comprehensive comparison with respect to performance and complexity
in Sec. IV. This, along with novel insights in techniques for
complexity reduction of the SD for MSDD, summarizes the main
contribution of the paper.
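As a point of reference for the schemes that follow, symbol-wise DD
only uses the first off-diagonal of the ACR output. A minimal sketch,
assuming the ACR outputs are stored in a symmetric list-of-lists `Z`
with `Z[l][i]` holding $Z_{l,i}$ (the storage layout and function
names are assumptions for illustration):

```python
def sign(x):
    # Ties (x == 0) are resolved to +1; this is an arbitrary convention.
    return 1 if x >= 0 else -1


def dd_detect(Z):
    """Symbol-wise DD: a_hat_i = sign(Z[i-1][i]) for i = 1..N."""
    return [sign(Z[i - 1][i]) for i in range(1, len(Z))]


Z = [[0.0, 0.9, 0.0, 0.0],
     [0.9, 0.0, -1.1, 0.0],
     [0.0, -1.1, 0.0, 0.2],
     [0.0, 0.0, 0.2, 0.0]]
a_hat = dd_detect(Z)   # a_hat == [1, -1, 1]
```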

A. Ideal Noncoherent Sequence Estimation (INSE) 

First, recall that ideal noncoherent sequence estimation (INSE)
would jointly decide for the best sequence of N + 1 symbols, taking
into account the receive signal in the entire burst interval
$0 \le t < (N+1)T$. As the statistics of the receive pulse shape
$p(t)$ are unknown, according to generalized-likelihood ratio
testing (GLRT) an explicit optimization over this unknown parameter
is included [26]. Collecting the transmit symbols in a vector
$\mathbf{b} = [b_0, \ldots, b_N]$, this results in solving [?], [27]

$$ \hat{\mathbf{b}}^{\mathrm{INSE}} = \operatorname*{argmax}_{\mathbf{b} \in \{\pm 1\}^{N+1},\; b_0 = 1} \; \sum_{i=1}^{N} \left( \sum_{l=0}^{i-1} b_l b_i Z_{l,i} \right) \tag{4} $$

where the statistics $Z_{l,i}$, $i = 1, \ldots, N$,
$l = 0, \ldots, i-1$, are obtained from an N-branch ACR. It is
evident that, for large N, this is infeasible for two reasons: a)
correlations of the receive signal over time delays of NT have to be
performed, which requires accurate delay lines over possibly
hundreds of symbols, making hardware implementation impossible, and
b) the computational complexity of finding the best sequence is
exponential in N, and thus intractable, since, at least in the worst
case, any algorithm must perform an exhaustive search over all $2^N$
possible sequences.

Hence, methods which are based on an ACR with only L ≪ N branches
and reduced computational complexity (at best linear in L) are
called for. To motivate these techniques, note that INSE for binary
signaling can be viewed as a tree search problem in a binary tree of
depth N, with the optimum sequence given by the path from the root
to the leaf with maximum path metric.
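The exhaustive search behind (4) can be written down directly, which
also makes the $2^N$ growth tangible. This is only a reference sketch
for very small N, assuming the full symmetric matrix of ACR outputs
is available as a list of lists `Z`; the function names are
illustrative.

```python
from itertools import product


def inse_metric(b, Z):
    """Double sum of (4): sum over i of sum over l < i of b_l * b_i * Z[l][i]."""
    return sum(b[l] * b[i] * Z[l][i] for i in range(1, len(b)) for l in range(i))


def inse_brute_force(Z):
    """Exhaustive search over all 2^N sequences with b_0 = 1."""
    N = len(Z) - 1
    candidates = ((1,) + tail for tail in product((1, -1), repeat=N))
    return list(max(candidates, key=lambda b: inse_metric(b, Z)))


# Noise-free toy example with unit pulse energy: Z[l][i] = b_l * b_i.
b_true = [1, -1, -1, 1]
Z = [[b_true[l] * b_true[i] if l != i else 0 for i in range(4)] for l in range(4)]
b_hat = inse_brute_force(Z)   # recovers b_true
```

Each candidate corresponds to one root-to-leaf path of the binary
tree mentioned above; the schemes in the following subsections avoid
visiting all of them.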

B. Multiple-Symbol Differential Detection (MSDD) 

One possible approximation of the INSE search problem is to split
the binary tree of depth N into smaller subtrees of depth L and to
solve each one independently. This corresponds to splitting the
burst of N symbols into smaller blocks of L symbols, and performing
block-wise MSDD based on the receive signal only in the
corresponding interval. To this end, the burst of N + 1 receive
symbols is decomposed into N/L blocks of L + 1 symbols
$\mathbf{b}_\kappa = [b_{\kappa L}, b_{\kappa L+1}, \ldots, b_{\kappa L+L}]$,
$\kappa = 0, 1, \ldots, N/L - 1$, each representing L information
symbols, which, due to the differential encoding, overlap by one
symbol (cf. Footnote 2).
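The block decomposition can be sketched as simple index bookkeeping.
The helper names below are hypothetical; the sketch assumes that a
shorter final block is used when L does not divide N, and that the
per-block statistics are the corresponding square sub-matrix of the
ACR outputs.

```python
def block_ranges(N, L):
    """Return (first, last) transmit-symbol indices of each block.

    Consecutive blocks share one symbol: the last symbol of a block is the
    reference symbol of the next one.
    """
    ranges = []
    first = 0
    while first < N:
        last = min(first + L, N)      # the final block may be shorter
        ranges.append((first, last))
        first = last
    return ranges


def sub_matrix(Z, first, last):
    """ACR outputs restricted to the symbols first..last (inclusive)."""
    return [row[first:last + 1] for row in Z[first:last + 1]]


ranges = block_ranges(10, 4)   # [(0, 4), (4, 8), (8, 10)]
```

For N = 10 and L = 4 the last block covers only 2 information
symbols, which equals N mod L.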

The decision metric of block-wise MSDD is directly obtained from the
INSE decision metric when restricting it to the corresponding block
intervals. However, to facilitate the application of tree-search
decoding algorithms, a constant is subtracted similar to [?],
yielding, with argmax(x) = argmin(-x), exemplarily for the first
block,

$$ \hat{\mathbf{b}}_0^{\mathrm{MSDD}} = \mathrm{MSDD}(\mathbf{Z}, R_{\mathrm{stop}}) = \operatorname*{argmin}_{\mathbf{b} \in \{\pm 1\}^{L+1},\; b_0 = 1} \; \sum_{i=1}^{L} \sum_{l=0}^{i-1} \left( |Z_{l,i}| - b_l b_i Z_{l,i} \right) . \tag{5} $$

The required statistics to solve (5), $Z_{l,i}$, can be obtained
from an ACR with only L (≪ N) branches.

Due to the reformulation, the i-th increment of the decision metric,

$$ \delta_i = \sum_{l=0}^{i-1} \left( |Z_{l,i}| - b_l b_i Z_{l,i} \right) , \tag{6} $$

is always non-negative and solely depends on $b_i$ and the i
preceding symbols $b_l$, $l = 0, 1, \ldots, i-1$. This allows the
decision metric to be checked componentwise, and thus fits into the
framework of general tree-search decoding and in particular enables
the application of the SD [?], [28]. Note there are further options
to approximately solve (5) efficiently, e.g., based on relaxations
of the search problem [22], [29] (cf. Footnote 3).
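A small numerical check can make the two properties used above
concrete: every increment (6) is non-negative, and the MSDD metric of
(5) differs from the maximization metric only by a constant that does
not depend on the candidate sequence. The sketch below uses randomly
drawn test values for Z; function names are illustrative.

```python
import random


def inse_metric(b, Z):
    """Metric that is maximized in the INSE formulation, restricted to one block."""
    return sum(b[l] * b[i] * Z[l][i] for i in range(1, len(b)) for l in range(i))


def msdd_increment(b, Z, i):
    """delta_i of (6); each term is either 0 or 2*|Z[l][i]|, hence non-negative."""
    return sum(abs(Z[l][i]) - b[l] * b[i] * Z[l][i] for l in range(i))


def msdd_metric(b, Z):
    """Metric that is minimized in (5)."""
    return sum(msdd_increment(b, Z, i) for i in range(1, len(b)))


random.seed(0)
L = 4
Z = [[0.0] * (L + 1) for _ in range(L + 1)]
for i in range(L + 1):
    for l in range(i):
        Z[l][i] = Z[i][l] = random.gauss(0.0, 1.0)   # arbitrary test values

b = [1, -1, 1, 1, -1]
const = sum(abs(Z[l][i]) for i in range(1, L + 1) for l in range(i))
increments = [msdd_increment(b, Z, i) for i in range(1, L + 1)]
gap = msdd_metric(b, Z) - (const - inse_metric(b, Z))   # zero up to rounding
```

Non-negative increments are what allow a tree search to discard a
partial path as soon as its accumulated metric exceeds the current
search radius.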

Clearly, for L = N, INSE is obtained; for the case of L = 1, i.e.,
the decision of a single information symbol, block-wise MSDD reduces
to traditional DD. In [30] it has been observed that using blocks
overlapping by more than one symbol, so-called subset MSDD, yields
further gains in performance at the cost of complexity. Due to lack
of space this is not considered in this paper.

Footnote 2: If necessary, the final block length is reduced to
$L_f = N \bmod L$.

Footnote 3: These approaches are not considered in the later
comparison, as they require operations of significantly higher
complexity compared to the presented schemes, such as solving a
semi-definite program in [22] or calculating dominant eigenvectors
in [29].

     1:  R := +∞;  Λ_0 := 0
     2:  b_0 := 1;  i := 1
     3:  p_i := Σ_{l=0}^{i-1} Z_{l,i} b_l;  q_i := Σ_{l=0}^{i-1} |Z_{l,i}|
     4:  b_i := sign(p_i);  n_i := 1
     5:  while i > 0 {
     6:      Λ_i := Λ_{i-1} + q_i - b_i p_i
     7:      if Λ_i < R {
     8:          if i < L {
     9:              i := i + 1
    10:              p_i := Σ_{l=0}^{i-1} Z_{l,i} b_l;  q_i := Σ_{l=0}^{i-1} |Z_{l,i}|
    11:              b_i := sign(p_i);  n_i := 1
    12:          } else {
    13:              b^MSDD := b;  R := Λ_i
    14:              if R < R_stop { break and return b^MSDD }
    15:              i := i - 1
    16:              while n_i > 1 { i := i - 1 }
    17:              b_i := -b_i;  n_i := n_i + 1
    18:          }
    19:      } else {
    20:          i := i - 1
    21:          while n_i > 1 { i := i - 1 }
    22:          b_i := -b_i;  n_i := n_i + 1
    23:      }
    24:  }

Fig. 2. Pseudo-code representation of the SD algorithm for MSDD of
IR-UWB.

We briefly review SD-based MSDD (cf. Footnote 4): employing the
Schnorr-Euchner search strategy, at some node at depth i - 1 the SD
chooses the branch labeled by $b_i$ with minimum branch metric. As
the SD operates on the transmit symbols, using (4) and (6) this is
directly given as

$$ \hat{b}_i = \operatorname*{argmin}_{b_i \in \pm 1} \delta_i = \operatorname*{argmax}_{b_i \in \pm 1} \sum_{l=0}^{i-1} b_l b_i Z_{l,i} = \mathrm{sign}\!\left( \sum_{l=0}^{i-1} Z_{l,i} b_l \right) . \tag{7} $$

The tree is only extended along this branch if the partial decision
metric $\Lambda_i$ is less than the search radius R.

At the beginning this search radius can be chosen arbitrarily large,
but it is updated whenever a new (preliminary) best block is found.
The SD algorithm for MSDD of IR-UWB is summarized in pseudo-code
representation in Fig. 2 (including techniques for complexity
reduction as described below). For brevity we defined
$\delta_i = q_i - b_i p_i$, and the symmetric matrix
$\mathbf{Z} \in \mathbb{R}^{(L+1) \times (L+1)}$ with elements
$Z_{i,l} = Z_{l,i}$, $i, l = 0, \ldots, L$. As it does not influence
the decision we may force the diagonal elements to be $Z_{i,i} = 0$.
Exemplarily, for L = 2 we have

$$ \mathbf{Z} = \begin{bmatrix} 0 & Z_{0,1} & Z_{0,2} \\ Z_{0,1} & 0 & Z_{1,2} \\ Z_{0,2} & Z_{1,2} & 0 \end{bmatrix} . \tag{8} $$

We consider three techniques to speed up the SD search process, two
of which are presented here; the third is presented along with DF-DD
in Sec. III-D.

Footnote 4: cf. [?] for details, but note that, in contrast to [?],
the presented SD operates on the transmit symbols $b_i$ rather than
on the data symbols $a_i$, yielding certain benefits as described
below.
1) SD stopping radius: The search process is terminated early if the
metric of any preliminary sequence during the SD search process is
less than a precomputed stopping radius $R_{\mathrm{stop}}$, cf.
Line 14 in Fig. 2. In [8] it has been shown that choosing the
stopping radius as

$$ R_{\mathrm{stop}} = L \cdot \min_{l \ne i} |Z_{l,i}| \tag{9} $$

preserves the optimality of the SD output.

2) Initial SD search radius: Instead of choosing the initial SD
search radius arbitrarily large (cf. Line 1 in Fig. 2), it may be
set to any good estimate. This can, e.g., be the decision metric of
the DD sequence. In this case, if the SD does not find a better
sequence, $\hat{\mathbf{b}}^{\mathrm{MSDD}} = \hat{\mathbf{b}}^{\mathrm{DD}}$.
If the metric of the DD sequence already meets the above stopping
criterion, an SD call is not necessary at all.
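The search described above can be sketched in Python as follows. This
is an illustrative reimplementation that follows the structure of
Fig. 2, not the authors' code: the function and variable names, the
handling of the root node, and the convention of returning `None`
when no block beats the initial radius are assumptions. The usage
part compares the result against an exhaustive search on randomly
drawn test matrices, with and without the stopping radius (9).

```python
import math
import random
from itertools import product


def sign(x):
    return 1 if x >= 0 else -1  # ties resolved to +1 (arbitrary convention)


def msdd_metric(b, Z):
    """Decision metric of (5) for a candidate block b with b[0] = 1."""
    return sum(abs(Z[l][i]) - b[l] * b[i] * Z[l][i]
               for i in range(1, len(b)) for l in range(i))


def sd_msdd(Z, r_stop=0.0, r_init=math.inf):
    """Depth-first SD with Schnorr-Euchner ordering for one block.

    Returns the best block found, or None if no block has a metric
    below r_init (the caller then keeps its initial candidate).
    """
    L = len(Z) - 1
    b = [1] * (L + 1)        # b[0] = 1 is the reference symbol
    n = [1] * (L + 1)        # n[i]: number of branches tried at depth i
    p = [0.0] * (L + 1)
    q = [0.0] * (L + 1)
    lam = [0.0] * (L + 1)    # partial metrics, lam[0] = 0
    best, R = None, r_init
    i = 1
    p[1], q[1] = Z[0][1], abs(Z[0][1])
    b[1] = sign(p[1])
    while i > 0:
        lam[i] = lam[i - 1] + q[i] - b[i] * p[i]
        if lam[i] < R and i < L:
            # extend the tree, taking the branch with the smaller increment first
            i += 1
            p[i] = sum(Z[l][i] * b[l] for l in range(i))
            q[i] = sum(abs(Z[l][i]) for l in range(i))
            b[i], n[i] = sign(p[i]), 1
            continue
        if lam[i] < R:
            # leaf with a smaller metric: new preliminary result and radius
            best, R = b[:], lam[i]
            if R < r_stop:
                break
        # backtrack to the closest depth that still has an untried branch
        i -= 1
        while i > 0 and n[i] > 1:
            i -= 1
        if i > 0:
            b[i], n[i] = -b[i], n[i] + 1
    return best


rng = random.Random(0)
L = 4
agree = True
for _ in range(200):
    Z = [[0.0] * (L + 1) for _ in range(L + 1)]
    for i in range(L + 1):
        for l in range(i):
            Z[l][i] = Z[i][l] = rng.gauss(0.0, 1.0)   # arbitrary test values
    brute = min(msdd_metric((1,) + t, Z) for t in product((1, -1), repeat=L))
    r_stop = L * min(abs(Z[l][i]) for i in range(1, L + 1) for l in range(i))
    for result in (sd_msdd(Z), sd_msdd(Z, r_stop=r_stop)):
        agree = agree and abs(msdd_metric(result, Z) - brute) < 1e-9
```

Passing the DD metric as `r_init` corresponds to the second technique
above; in that case a return value of `None` means the DD sequence is
kept.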

C. Viterbi Algorithm 

Another technique to approximate INSE, suggested in [7], employs the
Viterbi algorithm (VA). Note that, in the case of INSE, the memory
length increases linearly from 1 to N. To enable the implementation
of the VA, the memory length is truncated to a maximum of L. Again
taking the view that INSE is a search in a binary tree of depth N,
this results in nodes which may be assumed to be equivalent starting
from a depth greater than L. Merging these nodes, a trellis
structure with a total of $2^L$ states is obtained. This procedure
is in the spirit of delayed decision-feedback sequence estimation
[9], [10]. The path metric for the VA is obtained from the INSE
metric (4) by restricting the memory length to L, i.e.,

$$ \Lambda^{\mathrm{VA}} = \sum_{i=1}^{N} \sum_{l=\max(0,\,i-L)}^{i-1} b_l b_i Z_{l,i} . \tag{10} $$

As usual for the VA, the final estimate is the sequence with maximum
path metric. The VA ensures a fixed complexity (exponential in L,
but linear in N). Again, an L-branch ACR is sufficient. Depending on
L, the VA trades off between DD (L = 1) and INSE (L = N); thus, for
L = N, MSDD and the VA are equivalent.
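A compact way to see how the truncated memory leads to a trellis is
the following dictionary-based sketch of the VA for the path metric
(10). It is a plain illustration, not an optimized implementation:
survivors are stored with their full paths, the state is taken to be
the tuple of the last (at most) L transmit symbols, and all names are
assumptions. The usage part checks the two limiting cases named
above on randomly drawn test values.

```python
import random
from itertools import product


def viterbi_detect(Z, L):
    """Maximize the path metric (10) over b_1..b_N with b_0 = 1."""
    N = len(Z) - 1
    # state (last <= L transmit symbols) -> (path metric, full survivor path)
    survivors = {(1,): (0.0, [1])}
    for i in range(1, N + 1):
        new = {}
        lo = max(0, i - L)
        for state, (metric, path) in survivors.items():
            for bi in (1, -1):
                inc = sum(path[l] * bi * Z[l][i] for l in range(lo, i))
                key = (state + (bi,))[-L:]
                # keep only the best path entering each state
                if key not in new or metric + inc > new[key][0]:
                    new[key] = (metric + inc, path + [bi])
        survivors = new
    return max(survivors.values(), key=lambda s: s[0])[1]


rng = random.Random(0)
N = 5
Z = [[0.0] * (N + 1) for _ in range(N + 1)]
for i in range(N + 1):
    for l in range(i):
        Z[l][i] = Z[i][l] = rng.gauss(0.0, 1.0)   # arbitrary test values


def full_metric(b):
    return sum(b[l] * b[i] * Z[l][i] for i in range(1, N + 1) for l in range(i))


# L = N: no states are merged, so the VA coincides with exhaustive search.
b_va_full = viterbi_detect(Z, L=N)
b_brute = list(max(((1,) + t for t in product((1, -1), repeat=N)), key=full_metric))

# L = 1: the decisions reduce to symbol-wise DD.
b_va_1 = viterbi_detect(Z, L=1)
a_va_1 = [b_va_1[i] * b_va_1[i - 1] for i in range(1, N + 1)]
a_dd = [1 if Z[i - 1][i] >= 0 else -1 for i in range(1, N + 1)]
```

Once i exceeds L, the dictionary holds $2^L$ entries, which is the
state count of the trellis.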

D. Decision-Feedback Differential Detection (DF-DD) 

The VA, as well as block-wise MSDD in the worst case, have a
complexity in the order of $2^L$, i.e., exponential in the memory
length or blocksize, respectively. It is desirable to have a
complexity linear in L. This can be achieved using the principle of
decision-feedback differential detection (DF-DD) [11], [12], [13].
There are essentially two variants of DF-DD for IR-UWB, both
operating on the output of an L-branch ACR: block-wise DF-DD being
closely related to block-wise MSDD, and continuous DF-DD being
related to the VA implementation.

1) Block-Wise DF-DD (bDF-DD): Block-wise DF-DD is directly obtained
from SD-based block-wise MSDD [?]. The Schnorr-Euchner search
strategy in the SD for MSDD ensures that the first estimate in the
SD search process equals DF-DD. Thus, terminating the SD after the
first point found results in DF-DD with a linearly increasing
feedback window length (from 1 to L). This is achieved, e.g., by
calling the SD with $R_{\mathrm{stop}} = \infty$, or, equivalently,
choosing $\hat{b}_0^{\mathrm{bDF\text{-}DD}} = 1$, and, similar to
(7),

$$ \hat{b}_i^{\mathrm{bDF\text{-}DD}} = \mathrm{sign}\!\left( \sum_{l=0}^{i-1} Z_{l,i} \, \hat{b}_l^{\mathrm{bDF\text{-}DD}} \right) . \tag{11} $$

It is well known, especially from DF equalization in multi-antenna
systems, also known as BLAST [6], [16], that taking the decisions in
an optimized order, i.e., employing some sorting, improves the
performance. Similarly, in the context of IR-UWB, interchanging the
decision order within a block is enabled through the block-wise
processing of bDF-DD (then labeled sorted block-wise DF-DD
(sbDF-DD)). Interchanging the decision order can easily be achieved
by reordering the columns and rows of Z according to some sequence
$(i_0, i_1, \ldots, i_L)$, $i_k \in \{0, \ldots, L\}$,
$i_k \ne i_l$ for $k \ne l$.

A reasonable sorting criterion can be derived from the DF-DD process
itself. For reliable decisions in each step, the magnitude of the
argument of the sign function in (11) is desired to be as large as
possible. Hence, with $i_0 = 0$,
$\hat{b}_{i_0}^{\mathrm{sbDF\text{-}DD}} = 1$, the first decided
symbol should be the $i_1$-th symbol, where
$i_1 = \operatorname{argmax}_{i \in \{1,\ldots,L\}} |Z_{0,i} \, \hat{b}_0^{\mathrm{sbDF\text{-}DD}}|$.
Taking the previous decisions into account, the symbol which can be
decided most reliably next can be found successively from

$$ i_k = \operatorname*{argmax}_{i \in \{1,\ldots,L\} \setminus \{i_1,\ldots,i_{k-1}\}} \left| \sum_{l=0}^{k-1} Z_{i_l,i} \, \hat{b}_{i_l}^{\mathrm{sbDF\text{-}DD}} \right| \tag{12} $$

and its value reads

$$ \hat{b}_{i_k}^{\mathrm{sbDF\text{-}DD}} = \mathrm{sign}\!\left( \sum_{l=0}^{k-1} Z_{i_l,i_k} \, \hat{b}_{i_l}^{\mathrm{sbDF\text{-}DD}} \right) \tag{13} $$

where $k = 1, \ldots, L$. Basically, this sorting criterion forces
reliable decisions for the first decided symbols, which then
strongly influence the upcoming decisions (cf. Footnote 5). It has
to be noted that, in contrast to BLAST, sorting is done per block
based on the actual receive symbols and taking the previous
decisions into account, rather than on the channel realization.
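Both block-wise variants fit into a few lines of Python. The sketch
below assumes a symmetric list-of-lists `Z` with zero diagonal; the
function names are illustrative. The toy values are chosen by hand so
that the weakest correlation, $Z_{0,1}$, has the wrong sign: the
natural decision order then propagates this error, while the sorted
order decides the strongest symbol first and recovers.

```python
def sign(x):
    return 1 if x >= 0 else -1  # ties resolved to +1 (arbitrary convention)


def bdfdd(Z):
    """Block-wise DF-DD in natural order, cf. (11)."""
    L = len(Z) - 1
    b = [1] + [0] * L
    for i in range(1, L + 1):
        b[i] = sign(sum(Z[l][i] * b[l] for l in range(i)))
    return b


def sorted_bdfdd(Z):
    """Sorted block-wise DF-DD: pick the most reliable symbol next, cf. (12), (13)."""
    L = len(Z) - 1
    b = [0] * (L + 1)
    b[0] = 1
    decided = [0]
    remaining = list(range(1, L + 1))
    while remaining:
        # feedback term of every undecided symbol, using all decisions so far
        fb = {i: sum(Z[l][i] * b[l] for l in decided) for i in remaining}
        ik = max(remaining, key=lambda i: abs(fb[i]))   # sorting rule (12)
        b[ik] = sign(fb[ik])                            # decision (13)
        decided.append(ik)
        remaining.remove(ik)
    return b


# Hand-made example for b = [1, -1, 1]; Z[0][1] is weak and has the wrong sign.
Z = [[0.0, 0.1, 1.0],
     [0.1, 0.0, -0.9],
     [1.0, -0.9, 0.0]]
plain = bdfdd(Z)            # [1, 1, 1]
ordered = sorted_bdfdd(Z)   # [1, -1, 1]
```

The feedback sums needed for the sorting rule are the same ones used
for the decisions, so the sorted variant requires no additional sums
beyond bookkeeping of which symbols are still undecided.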

Footnote 5: Different sorting criteria are also possible, e.g., the
$\ell_1$-/$\ell_\infty$-norm (column/row norm are equivalent) of the
matrix Z, or according to the first row of Z. However, we have found
that all show some loss compared to successive sorting during the
DF-DD process (up to 1 dB for the $\ell_\infty$-norm and the
first-row criterion, and only marginal loss for the $\ell_1$-norm).

Noteworthy, the special case of sorted block-wise DF-DD and L = 2 is
equivalent to MSDD, i.e.,
$\hat{\mathbf{b}}^{\mathrm{sbDF\text{-}DD}} = \hat{\mathbf{b}}^{\mathrm{MSDD}}$.
To prove this, assume Z is sorted, thus $|Z_{0,1}| \ge |Z_{0,2}|$
holds (cf. (12)). Since sbDF-DD chooses
$\hat{b}_0^{\mathrm{sbDF\text{-}DD}} = 1$,
$\hat{b}_1^{\mathrm{sbDF\text{-}DD}} = \mathrm{sign}(Z_{0,1})$, and

$$ \hat{b}_2^{\mathrm{sbDF\text{-}DD}} = \mathrm{sign}\!\left( Z_{0,2} + \mathrm{sign}(Z_{0,1}) Z_{1,2} \right) , $$

the MSDD metric evaluates to

$$ \Lambda = |Z_{0,2}| + |Z_{1,2}| - \left| Z_{0,2} + \mathrm{sign}(Z_{0,1}) Z_{1,2} \right| . $$

If $\mathrm{sign}(Z_{0,2}) = \mathrm{sign}(Z_{0,1} Z_{1,2})$,
$\Lambda = 0$. Otherwise, either $\Lambda = 2|Z_{1,2}|$ or
$\Lambda = 2|Z_{0,2}|$, depending on whether
$|Z_{1,2}| < |Z_{0,2}|$ or vice versa, respectively. In any case the
minimum possible MSDD metric
$\Lambda \in \{0, \, 2 \min_{l \ne i} |Z_{l,i}|\}$ results.

A similar sorting step may also be used as a preprocessing step in
the case of MSDD using the SD; however, as no previous decisions are
available, (12) is upper bounded using the triangle inequality. Note
that sorting is only possible as the SD operates on the transmit
symbols. Since the SD starts with the DF-DD sequence, using a sorted
SD input (and correspondingly reordering the output) delivers an
improved first preliminary sequence and a correspondingly updated
search radius, and thus speeds up the SD search process. Note that
sorting of the SD input, among different preprocessing steps, is a
well-known technique for SD complexity reduction [6], [28], [?].

2) Continuous DF-DD (cDF-DD): Another variant of DF-DD with a
symbol-wise processing (sliding window) can be derived from the VA
making use of the relation of the VA and DF-DD, which has been
established in [9], [10]: in this view DF-DD corresponds to a
reduced-state sequence estimation with only a single state. For
cDF-DD of IR-UWB, in contrast to (11), now L previous decisions are
fed back to improve the decision of the current symbol, thus fixing
the memory length to L as in the VA. In detail, cDF-DD chooses

$$ \hat{b}_i^{\mathrm{cDF\text{-}DD}} = \mathrm{sign}\!\left( \sum_{l=\max(0,\,i-L)}^{i-1} Z_{l,i} \, \hat{b}_l^{\mathrm{cDF\text{-}DD}} \right) . \tag{14} $$

Due to the symbol-wise processing, sorting is not applicable for
cDF-DD. The transient behavior at the beginning of the stream leads
to cDF-DD and bDF-DD being equivalent when L = N.
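The sliding-window rule (14) differs from the block-wise rule only in
the lower summation limit. A minimal sketch, with illustrative names
and a hand-made test matrix:

```python
def sign(x):
    return 1 if x >= 0 else -1  # ties resolved to +1 (arbitrary convention)


def cdfdd(Z, L):
    """Continuous DF-DD: feed back at most the L previous decisions, cf. (14)."""
    N = len(Z) - 1
    b = [1] + [0] * N
    for i in range(1, N + 1):
        b[i] = sign(sum(Z[l][i] * b[l] for l in range(max(0, i - L), i)))
    return b


Z = [[0.0, 0.1, 1.0],
     [0.1, 0.0, -0.9],
     [1.0, -0.9, 0.0]]
b_full = cdfdd(Z, L=2)   # window covers the whole burst: [1, 1, 1]
b_dd = cdfdd(Z, L=1)     # only one decision fed back (DD): [1, 1, -1]
```

With L equal to the burst length the window never slides, so the
decisions coincide with unsorted block-wise DF-DD on a single block.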

IV. Comparison 

In this section, we compare the presented IR-UWB detection 
schemes in terms of performance and complexity. We first 
define the complexity measure adopted in this paper and 
then assess the performance-complexity tradeoff via numerical 
results. 

A. Complexity 

Since all schemes (apart from INSE, which only serves as a
reference) are based on the output of the same L-branch ACR, we
focus on the computational complexity of the decision unit. Due to
binary signaling, all multiplications (e.g., in (5)) are limited to
sign inversions. Thus, assuming a suitable number format such that
sign inversion and the sign(·) and |·| operations require negligible
complexity (e.g., two's complement), the main source of
computational complexity of SD-based MSDD, the VA, or variants of
DF-DD is the number of real-valued additions (adds).

Due to the triangular structure, block-wise DF-DD performs
(L - 1)/2 adds per symbol, while continuous DF-DD requires (L - 1)
adds per symbol (cf. Footnote 6), both having a complexity linear in
L. The VA performs 2L adds per state, thus, in total
$2L \cdot 2^L$ adds per processed information symbol, yielding a
complexity exponential in L. Concerning SD-based MSDD, the number of
real-valued adds of the SD search process depends on the realization
of Z and in particular on the SNR. It ranges from $2L \cdot 2^L$
adds per block in the worst case (per symbol exponential in L, cf.
Footnote 7), to a minimum of L(L + 1) - 1 adds per block in the best
case (per symbol linear in L). The latter occurs when the first path
found during the SD search process fulfills the stopping criterion,
i.e., the SD only computes the decision metric of one particular
sequence step by step. The same number of adds is required to find
an initial search radius for the SD based on the MSDD decision
metric of a particular sequence. In the case of sorted DF-DD,
sorting does not add to the overall complexity, as sorting is done
successively based on a similar expression as required for taking
the decisions (the arguments of (12) and (13) are equal). However,
if sorting is applied as a preprocessing step of the SD for MSDD,
calculating the optimized order increases the complexity by
(L - 1)/2 - 1 adds per block of L symbols.

Footnote 6: For simplicity of implementation we neglect the edge
effects in cDF-DD and the VA.

With the argumentation above, the complexity of DD, of finding the
stopping criterion (9), and of the final differential decoding step
to obtain the information symbols from the estimated transmit
symbols may be neglected.
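The add counts stated in this subsection can be tabulated with a few
helper functions. This is only a transcription of the expressions
given in the text into Python for quick comparison across L; the
function names are illustrative.

```python
def adds_per_symbol_bdfdd(L):
    """Block-wise DF-DD (sorted or not): (L - 1) / 2 adds per symbol."""
    return (L - 1) / 2


def adds_per_symbol_cdfdd(L):
    """Continuous DF-DD: L - 1 adds per symbol (edge effects neglected)."""
    return L - 1


def adds_per_symbol_va(L):
    """VA: 2L adds per state and 2^L states."""
    return 2 * L * 2 ** L


def adds_per_block_sd_best(L):
    """SD-based MSDD, best case: metric of a single sequence."""
    return L * (L + 1) - 1


def adds_per_block_sd_worst(L):
    """SD-based MSDD, worst case: the entire tree is searched."""
    return 2 * L * 2 ** L


for L in (2, 5, 10, 15):
    print(L, adds_per_symbol_bdfdd(L), adds_per_symbol_cdfdd(L),
          adds_per_symbol_va(L), adds_per_block_sd_best(L),
          adds_per_block_sd_worst(L))
```

For L = 10 this gives 4.5 and 9 adds per symbol for block-wise and
continuous DF-DD, respectively, against 20480 adds per symbol for the
VA.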

B. Numerical Results 

For all numerical simulations, a typical IR-UWB scenario has been
considered: the transmit pulse shape is chosen as a Gaussian
monocycle with 2.25 GHz center frequency and a bandwidth of 3.3 GHz
(measured at 10 dB), the propagation channel is modeled according to
IEEE-CM 2 [32] (constant over the burst interval and each
realization normalized to unit energy), and the receive filter is
matched to the transmit pulse shape. We assume no intersymbol
interference (T chosen sufficiently large), and, for this setting,
$T_\mathrm{i}$ = 30 ns is a good compromise for the integration time
of the ACR. All results have been averaged over a large number of
bursts.

First, Fig. 3 depicts the BER for short bursts with N = 2, 5, and 15
symbols, where INSE can be realized by an N-branch ACR in
combination with SD-based MSDD (all variants, sorted/non-sorted and
with/without initial or stopping radius, have the same performance
and differ only in complexity). INSE results in gains of about 4 dB
over traditional DD for N = 15. Even for the relatively large
feedback length of L = N = 15, DF-DD (block-wise and continuous
processing are equivalent for L = N) without sorting does not lead
to significant gains vs. DD. This is due to the linearly increasing
feedback window length from 1 to L, such that for the decision of
the first decided symbols only few decisions are fed back. The
performance of DF-DD is tremendously improved when the decision
order is optimized as described in Sec. III-D, yielding
close-to-optimum performance for L > 2, and, as shown in Sec. III-D,
exactly the same performance as MSDD, thus here also INSE, for the
special case of L = 2.

Footnote 7: In the worst case the SD searches the entire tree.
However, due to the Schnorr-Euchner search strategy, the involved
metric calculations are performed in an efficient way, cf. Fig. 2,
such that only
$\sum_{i=0}^{L-1} 2^i \left( 2 + \tfrac{1}{2}(2(i-1)+2) \right) + 2^L \cdot \tfrac{1}{2}(2(L-1)+2) = L \, 2^{L+1}$
adds are required.

Fig. 3. BER performance vs. $E_b/N_0$ of DF-DD (sorted and
non-sorted) in comparison to ideal noncoherent sequence estimation
(INSE), DD, and ideal coherent detection of IR-UWB for short bursts
(N = L) with L = 2 (o), L = 5 (×) and L = 15 (□). IEEE-CM 2,
$T_\mathrm{i}$ = 30 ns.

Fig. 4. BER performance vs. $E_b/N_0$ of DF-DD (block-wise with
sorting and continuous) in comparison to block-wise MSDD, VA-based
detection, DD, and ideal coherent detection of IR-UWB for bursts of
N = 100 with L = 2 (o), and L = 10 (×). INSE is approximated by
sorted DF-DD with L = 100. IEEE-CM 2, $T_\mathrm{i}$ = 30 ns.

Fig. 5. Histogram of the complexity (in adds per symbol) of
block-wise MSDD IR-UWB detection (with/without initial search
radius, sorted/non-sorted), in comparison to block-wise
(sorted/non-sorted) and continuous DF-DD at
$10 \log_{10}(E_b/N_0)$ = 10 dB for L = 10. Crosses: average
complexity. IEEE-CM 2, $T_\mathrm{i}$ = 30 ns.



In the case of a larger burst length (N = 100), as considered in
Fig. 4, INSE becomes impracticable due to the high computational
complexity and the required N-branch ACR. For reference, its
performance is therefore approximated by (sorted) DF-DD with
L = N = 100. Naturally, the presented reduced-complexity detection
schemes employing only an L-branch ACR with L < N, i.e., block-wise
MSDD, VA-based detection, and the DF-DD schemes, show increasing
loss compared to INSE for decreasing L. While block-wise MSDD (again
all variants, sorted/non-sorted and with/without initial or stopping
radius, show exactly the same performance) with a blocksize of L is
clearly outperformed by the continuous approach of VA-based
detection with a fixed memory length of L, block-wise DF-DD with
sorting and a linearly increasing memory length from 1 to L is
superior to continuous DF-DD with a fixed memory length. Thus, as
known from other applications [6], [16], in the case of DF-DD the
sorting step, which is only applicable for block-wise processing, is
crucial to achieve high performance with decision-feedback schemes.



However, the VA, while achieving the best performance with an
L-branch ACR, requires a significantly higher computational
complexity compared to the other schemes, and thus may be applied
only in the case of very small L (say, for L < 3). For L = 10,
Fig. 5 shows the complexity (measured as the number of adds per
information symbol) of DF-DD and, due to the varying complexity,
normalized histograms of the complexity of MSDD using the SD
employing different combinations of the presented complexity
reduction techniques (all use the packing-radius-based stopping
radius, cf. (9) and [8]) at an operating point of
$10 \log_{10}(E_b/N_0)$ = 10 dB, yielding a BER ≈ $10^{-?}$. The
complexity of the VA is orders of magnitude higher
($2L \cdot 2^L = 20480$ adds per symbol for L = 10) and is thus not
included. Straightforward application of the SD for MSDD, employing
neither an initial search radius nor sorting of the SD input, in
many cases requires only relatively few additions (in the order of
DF-DD), but there is a high variation, yielding a relatively large
average and a very high worst-case complexity (cf. tails of the
histograms with >40 adds). Surprisingly, incorporating an initial
search radius based on DD mainly results in an increased complexity.
This is due to the fact that the additional complexity of
calculating the MSDD decision metric of the DD sequence is not
compensated by a sufficiently large reduction in search complexity.
Although the sorting step prior to the SD adds to the overall
complexity as well, it is more than compensated afterwards, yielding
reduced average complexity and significantly less variation. Again,
incorporating DD as an initial search radius mainly increases the
complexity of sorted MSDD, such that we may conclude that MSDD
employing a sorting step of the SD input and the
packing-radius-based stopping criterion is the lowest-complexity
variant among all SD-based variants for MSDD. Similarly, sorted
block-wise DF-DD is clearly preferable to other variants of DF-DD as
it shows superior performance at half the complexity of continuous
DF-DD and the same complexity as block-wise DF-DD without sorting.

Finally, Fig. 6 summarizes the tradeoff performance (in SNR required
to guarantee a desired BER) vs. complexity (in adds per symbol)
obtained with the presented schemes, i.e., DD and the more
sophisticated schemes making use of an L-branch ACR (DF-DD, VA, and
sorted block-wise MSDD), for L = 5 and L = 15, at an operating point
of BER = $10^{-?}$ (thus comparable to L = 10 at
$10 \log_{10}(E_b/N_0)$ = 10 dB in Fig. 5). The fixed and average
complexity is indicated by markers in the case of (DF-)DD and MSDD,
respectively; for MSDD, a colormap also indicates the histogram of
the complexity (darker/lighter: higher/less occurrence). The
complexity of the VA is orders of magnitude higher than that of the
other schemes (320 and $15 \cdot 2^{16}$ adds per symbol for L = 5
and 15, respectively); INSE (again approximated by sorted DF-DD with
L = N = 100) is not comparable in terms of complexity, as it
requires an N-branch ACR; thus, only the performance of both is
indicated.

Fig. 6. Tradeoff performance vs. complexity at BER = $10^{-?}$ of
IR-UWB detection using block-wise MSDD (sorted), block-wise (sorted
and non-sorted) and continuous DF-DD, and the VA for different L in
comparison to DD and INSE. Only MSDD: histogram of complexity
indicated as a colorbar (darker/lighter: higher/less occurrence,
average complexity indicated by crosses). Complexity of VA is orders
of magnitude higher (320 and $15 \cdot 2^{16}$ adds per symbol for
L = 5 and 15). IEEE-CM 2, $T_\mathrm{i}$ = 30 ns.

The detection schemes are lined up from lowest complexity and worst
performance in the case of DD (no adds, ×), followed by block-wise
(o) and continuous (□) DF-DD. Further performance gains, however at
higher and in particular varying complexity, are achieved using
sorted MSDD (average complexity, +). The variation of MSDD
complexity increases for increasing blocksize. Employing an L-branch
ACR, best performance at fixed, but very high complexity, is
obtained using the VA. The only exception to this strict line-up is
block-wise DF-DD with an optimized decision order (sorted block-wise
DF-DD, o), which achieves almost the performance of MSDD at
significantly less complexity.

From this comparison we conclude that block-wise DF-DD in
combination with sorting enables a very good performance-complexity
tradeoff, a result which should be viewed in particular in
comparison to other recently presented close-to-optimum block-based
detectors, cf. [22], [29] and Footnote 3. Note that additionally
scaling the stopping radius similar to [8] enables a smooth
transition between sorted MSDD and sorted DF-DD.



V. Conclusions 

In this paper we have presented autocorrelation-based
decision-feedback differential detection (DF-DD) schemes for IR-UWB
systems. To this end, we reviewed multiple-symbol differential
detection (MSDD) and detection based on the Viterbi algorithm in a
unified way, from which we derived the novel low-complexity DF-DD
schemes, exploiting concepts well-known from tree-search decoding. A
comprehensive comparison with respect to performance and complexity
of the presented schemes in a typical IR-UWB scenario reveals, along
with new insights in techniques for complexity reduction of the
sphere decoder applied for MSDD, that sorted DF-DD achieves
close-to-optimum performance at very low, and in particular
constant, receiver complexity.